OCR Exploration Server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books

Internal identifier: 000C65 (Main/Exploration); previous: 000C64; next: 000C66

Authors: Hengzhi Wu [United Kingdom]; Gabriella Kazai [United Kingdom]; Michael Taylor [United Kingdom]

RBID : ISTEX:8C6184460C675464B92E71BB4961244B26710894

Abstract

Through mass-digitization projects and with the use of OCR technologies, digitized books are becoming available on the Web and in digital libraries. The unprecedented scale of these efforts, the unique characteristics of the digitized material, and the unexplored possibilities of user interaction make full-text book search an exciting area of information retrieval (IR) research. Emerging research questions include: How appropriate and effective are traditional IR models when applied to books? Which book-specific features (e.g., the back-of-book index) should receive special attention during indexing and retrieval? How can we tackle scalability? To answer such questions, we developed an experimental platform that facilitates rapid prototyping of a book search system and supports large-scale tests. Using this system, we performed experiments on a collection of 10 000 books, evaluating the efficiency of a novel multi-field inverted index and the effectiveness of the BM25F retrieval model adapted to books, using book-specific fields.
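
To make the two techniques named in the abstract concrete, the following is a minimal Python sketch of a multi-field inverted index scored with the commonly published form of BM25F. It is not the authors' system: the field names (`title`, `body`), the field weights, the `b` and `k1` values, the whitespace tokenizer, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(books):
    """Build postings[term][book_id][field] = term frequency, plus field lengths.

    `books` maps a book id to a dict of {field name: text}. The field names
    are hypothetical; the abstract does not list the fields actually used.
    """
    postings = defaultdict(lambda: defaultdict(Counter))
    lengths = defaultdict(dict)
    for book_id, fields in books.items():
        for field, text in fields.items():
            tokens = text.lower().split()  # naive tokenizer (assumption)
            lengths[book_id][field] = len(tokens)
            for token in tokens:
                postings[token][book_id][field] += 1
    return postings, lengths


def bm25f_scores(query, postings, lengths, weights, b, k1=1.2):
    """Score books with BM25F: length-normalize each field's term frequency,
    combine the fields with per-field weights, then apply BM25 saturation."""
    n_docs = len(lengths)
    avg_len = {
        field: sum(doc.get(field, 0) for doc in lengths.values()) / n_docs
        for field in weights
    }
    scores = defaultdict(float)
    for term in query.lower().split():
        docs = postings.get(term)
        if not docs:
            continue
        df = len(docs)
        # Non-negative IDF variant (assumption; other variants exist).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for book_id, field_tf in docs.items():
            pseudo_tf = 0.0
            for field, tf in field_tf.items():
                if field not in weights:
                    continue
                norm = 1 + b[field] * (lengths[book_id][field] / avg_len[field] - 1)
                pseudo_tf += weights[field] * tf / norm
            scores[book_id] += idf * pseudo_tf / (k1 + pseudo_tf)
    return dict(scores)


# Toy collection with invented content, for illustration only.
books = {
    "b1": {"title": "book search",
           "body": "experiments on indexing and retrieval of books"},
    "b2": {"title": "cooking basics",
           "body": "a short book about search for recipes"},
}
weights = {"title": 3.0, "body": 1.0}  # hypothetical field weights
b = {"title": 0.5, "body": 0.75}       # hypothetical length-normalization values

postings, lengths = build_index(books)
scores = bm25f_scores("search", postings, lengths, weights, b)
for book_id in sorted(scores, key=scores.get, reverse=True):
    print(book_id, round(scores[book_id], 4))
```

In this toy collection the query term occurs in the title of `b1` but only in the body of `b2`, so the higher title weight ranks `b1` first. Combining field frequencies before the saturation step, rather than summing per-field BM25 scores, is what distinguishes BM25F from a plain linear combination of field scores.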

DOI: 10.1007/978-3-540-78646-7_23


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books</title>
<author>
<name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
</author>
<author>
<name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
</author>
<author>
<name sortKey="Taylor, Michael" sort="Taylor, Michael" uniqKey="Taylor M" first="Michael" last="Taylor">Michael Taylor</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8C6184460C675464B92E71BB4961244B26710894</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-78646-7_23</idno>
<idno type="url">https://api.istex.fr/document/8C6184460C675464B92E71BB4961244B26710894/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000255</idno>
<idno type="wicri:Area/Istex/Curation">000251</idno>
<idno type="wicri:Area/Istex/Checkpoint">000732</idno>
<idno type="wicri:doubleKey">0302-9743:2008:Wu H:book:search:experiments</idno>
<idno type="wicri:Area/Main/Merge">000C77</idno>
<idno type="wicri:Area/Main/Curation">000C65</idno>
<idno type="wicri:Area/Main/Exploration">000C65</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books</title>
<author>
<name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
<affiliation wicri:level="4">
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Computer Science, Queen Mary,University of London</wicri:regionArea>
<placeName>
<settlement type="city">Londres</settlement>
<region type="country">Angleterre</region>
<region type="région" nuts="1">Grand Londres</region>
</placeName>
<orgName type="university">Université de Londres</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Royaume-Uni</country>
</affiliation>
</author>
<author>
<name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Microsoft Research, Cambridge</wicri:regionArea>
<wicri:noRegion>Cambridge</wicri:noRegion>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: gabkaz@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Taylor, Michael" sort="Taylor, Michael" uniqKey="Taylor M" first="Michael" last="Taylor">Michael Taylor</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Microsoft Research, Cambridge</wicri:regionArea>
<wicri:noRegion>Cambridge</wicri:noRegion>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: mitaylor@microsoft.com</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">8C6184460C675464B92E71BB4961244B26710894</idno>
<idno type="DOI">10.1007/978-3-540-78646-7_23</idno>
<idno type="ChapterID">23</idno>
<idno type="ChapterID">Chap23</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Through mass-digitization projects and with the use of OCR technologies, digitized books are becoming available on the Web and in digital libraries. The unprecedented scale of these efforts, the unique characteristics of the digitized material as well as the unexplored possibilities of user interactions make full-text book search an exciting area of information retrieval (IR) research. Emerging research questions include: How appropriate and effective are traditional IR models when applied to books? What book specific features (e.g., back-of-book index) should receive special attention during the indexing and retrieval processes? How can we tackle scalability? In order to answer such questions, we developed an experimental platform to facilitate rapid prototyping of a book search system as well as to support large-scale tests. Using this system, we performed experiments on a collection of 10 000 books, evaluating the efficiency of a novel multi-field inverted index and the effectiveness of the BM25F retrieval model adapted to books, using book-specific fields.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Royaume-Uni</li>
</country>
<region>
<li>Angleterre</li>
<li>Grand Londres</li>
</region>
<settlement>
<li>Londres</li>
</settlement>
<orgName>
<li>Université de Londres</li>
</orgName>
</list>
<tree>
<country name="Royaume-Uni">
<region name="Angleterre">
<name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
</region>
<name sortKey="Kazai, Gabriella" sort="Kazai, Gabriella" uniqKey="Kazai G" first="Gabriella" last="Kazai">Gabriella Kazai</name>
<name sortKey="Taylor, Michael" sort="Taylor, Michael" uniqKey="Taylor M" first="Michael" last="Taylor">Michael Taylor</name>
<name sortKey="Wu, Hengzhi" sort="Wu, Hengzhi" uniqKey="Wu H" first="Hengzhi" last="Wu">Hengzhi Wu</name>
</country>
</tree>
</affiliations>
</record>
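
As an illustration of how a record like the one above could be read programmatically, here is a minimal Python sketch using only the standard library. It rests on several assumptions: the record is available as a string without an XML declaration, the `wicri:` prefix is not declared in the excerpt and is therefore bound here to a placeholder namespace URI, and the helper name `parse_record` and the shortened sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder URI (assumption): the excerpt never declares the wicri prefix,
# and a strict parser rejects undeclared prefixes.
WICRI_NS = "urn:example:wicri"


def parse_record(record_xml):
    """Extract title, authors, and DOI from a <record> element given as text."""
    # Wrap the record so that the wicri prefix is bound to a namespace.
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    root = ET.fromstring(wrapped)
    record = root.find("record")
    return {
        "title": record.findtext(".//titleStmt/title"),
        "authors": [name.text for name in record.findall(".//titleStmt/author/name")],
        "doi": record.findtext(".//publicationStmt/idno[@type='doi']"),
    }


# Shortened sample modeled on the record above (most elements omitted).
sample = (
    '<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>'
    '<titleStmt><title xml:lang="en">Book Search Experiments: Investigating IR '
    'Methods for the Indexing and Retrieval of Books</title>'
    '<author><name sortKey="Wu, Hengzhi">Hengzhi Wu</name></author>'
    '<author><name sortKey="Kazai, Gabriella">Gabriella Kazai</name></author>'
    '</titleStmt><publicationStmt>'
    '<idno type="wicri:source">ISTEX</idno>'
    '<idno type="doi">10.1007/978-3-540-78646-7_23</idno>'
    '</publicationStmt></fileDesc></teiHeader></TEI></record>'
)

info = parse_record(sample)
print(info["title"])
print(info["authors"])
print(info["doi"])
```

The wrapper element is only a workaround for the missing prefix declaration. If the full export declares `wicri` itself, the record can be passed to `ET.fromstring` directly.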

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C65 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000C65 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:8C6184460C675464B92E71BB4961244B26710894
   |texte=   Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024